An FFT-Based Companding Front End for Noise-Robust Automatic Speech Recognition
Abstract
Recommended by Stephen Voran
We describe an FFT-based companding algorithm for preprocessing speech before recognition. The algorithm mimics tone-to-tone suppression and masking in the auditory system to improve automatic speech recognition performance in noise. Because it is built on the FFT, it is also computationally efficient and well suited to digital implementation. In an automotive digit recognition task on the CU-Move database, recorded in real environmental noise, the algorithm yields a relative word error rate improvement of 12.5% at −5 dB signal-to-noise ratio (SNR) and of 6.2% across all SNRs (−5 dB to +15 dB). On the Aurora-2 database, in which noise from several environments is added artificially, the algorithm yields a relative word error rate improvement in almost all conditions.
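To make the idea of tone-to-tone suppression concrete, here is a minimal sketch of a compress-then-expand operation applied to an FFT magnitude spectrum. It is not the algorithm from the paper: the abstract does not give the details, so the overall structure (broad spectral weighting, compression, narrow spectral weighting, expansion), the Gaussian filter shapes, the widths, the exponents `n1` and `n2`, and the function name `compand_spectrum` are all assumptions chosen for illustration.

```python
import numpy as np

def compand_spectrum(mag, broad_width=8.0, narrow_width=1.0,
                     n1=0.5, n2=1.0, eps=1e-12):
    """Illustrative spectral companding (hypothetical parameters throughout).

    For each bin: measure energy under a broad window, compress,
    measure energy under a narrow window, then expand.
    """
    mag = np.asarray(mag, dtype=float)
    bins = np.arange(mag.size)
    dist = bins[:, None] - bins[None, :]

    # Assumed Gaussian weightings centred on each bin (unit gain at the centre).
    broad = np.exp(-0.5 * (dist / broad_width) ** 2)
    narrow = np.exp(-0.5 * (dist / narrow_width) ** 2)

    power = mag ** 2
    # Energy seen through the broad window; strong neighbours raise it.
    e1 = np.sqrt((broad ** 2) @ power + eps)
    gain1 = e1 ** (n1 - 1.0)                    # compression gain (n1 < 1)

    # Energy of the compressed signal seen through the narrow window.
    e3 = gain1 * np.sqrt(((broad * narrow) ** 2) @ power + eps)
    gain2 = e3 ** ((n2 - n1) / n1)              # expansion gain

    return gain2 * gain1 * mag

# Two tones on exact FFT bins: a strong one at bin 50, a weak one at bin 54.
n = 256
t = np.arange(n)
frame = np.cos(2 * np.pi * 50 * t / n) + 0.1 * np.cos(2 * np.pi * 54 * t / n)
mag = np.abs(np.fft.rfft(frame)) / (n / 2)      # normalise peaks to 1.0 and 0.1
out = compand_spectrum(mag)
print(out[50], out[54])
```

With these assumed settings an isolated tone passes through essentially unchanged, while the weak tone near the strong one is attenuated far more than the strong tone. That is the qualitative behaviour the abstract refers to as tone-to-tone suppression.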
Similar resources
Improving the performance of MFCC for Persian robust speech recognition
Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. To achieve satisfactory performance in automatic speech recognition (ASR) applications, this paper introduces a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing which applies to t...
Perceptual Models for Speech, Audio, and Music Processing
New understandings of human auditory perception have recently contributed to advances in numerous areas related to audio, speech, and music processing. These include coding, speech and speaker recognition, synthesis, signal separation, signal enhancement, automatic content identification and retrieval, and quality estimation. Researchers continue to seek more detailed, accurate, and robust ch...
A High-Dimensional Subband Speech Representation and SVM Framework for Robust Speech Recognition
This work proposes a novel support vector machine (SVM) based robust automatic speech recognition (ASR) frontend that operates on an ensemble of the subband components of high-dimensional acoustic waveforms. The key issues of selecting the appropriate SVM kernels for classification in frequency subbands and the combination of individual subband classifiers using ensemble methods are addressed. ...
A Novel Front-end Based on Variable Frame Rate Analysis and Mel-filterbank Output Compensation for Robust ASR
For automatic speech recognition (ASR) systems, robustness in the presence of various types and levels of environmental noise remains an important issue, despite the various advances of recent years. This paper describes a new noise-robust ASR front-end employing a combination of variable frame rate processing based on the sample-by-sample delta energy parameter, Mel-filterbank output compensati...
Robust speech recognition in noise: an evaluation using the SPINE corpus
In this paper, methodologies for effective speech recognition are considered along with evaluations of an NRL speech in noise corpus entitled SPINE. When speech is produced in adverse conditions that include high levels of noise, workload task stress, and Lombard effect, new challenges arise concerning how to best improve recognition performance. Here, we consider tradeoffs in (i) robust featur...
Journal title:
- EURASIP J. Audio, Speech and Music Processing
Volume 2007
Publication date 2007